Genetic association studies: time for a new paradigm?
Author
Abstract
Recently, CEBP published a new editorial policy for association studies of genetic variants and disease (1). The policy gives high priority to studies that meet four criteria. The studies must (a) evaluate a variant in light of its possible interaction with other genes in a pathway and with exogenous exposures; (b) consider the biological plausibility of the reported association(s) and the strength of prior evidence; (c) provide good statistical power for the reported association(s) or lack thereof; and (d) evaluate statistical significance in light of the multiple tests conducted. These criteria have the laudable goal of promoting clarity and consistency. However, they may be difficult to implement for two reasons. First, they require a balance between the need to evaluate many variants with a reasonable overall type I error rate and the loss of power associated with a stringent per-variant error rate. Second, they are qualitative, so their interpretation can vary across authors, reviewers, and editors. One way to resolve both difficulties is to introduce a quantitative credibility scale that automatically integrates the criteria. The Bayesian approach of Wacholder et al. (2) addresses the first three criteria but does not deal with test multiplicity. The frequentist false discovery rate of Benjamini and Hochberg (3) provides a solution to the test multiplicity problem but does not consider the first three criteria. By combining the Bayesian proposal of Wacholder et al. with the frequentist proposal of Benjamini and Hochberg, we can rank associations on a credibility scale that integrates all four criteria. The ranking starts from the prior probability k that an association is true, based on its biological plausibility and on the findings of previous studies. The prior odds in favor of the association are then $O_{\text{prior}} = k / (1 - k)$.
Let us say that the study under review has power q to detect the association (if real), and that the study data support it with a P value equal to p. As shown by Wacholder et al., the posterior odds in favor of the association, given the study data, are $O_{\text{post}} = O_{\text{prior}} \cdot (q / p)$. Notice that the posterior odds increase with increasing prior odds and study power q, and with decreasing P value p. The posterior odds thus integrate into a single measure the concepts of biological plausibility, study power, and P value. Wacholder et al. (2) suggest that associations can be evaluated more meaningfully by considering their posterior odds in addition to their P values. However, their proposal pertains to single associations in isolation, and so does not consider multiple findings for several polymorphisms in several genes. To address this need, I suggest using the posterior odds in a Bayesian multiple testing procedure akin to the frequentist procedure proposed by Benjamini and Hochberg (3). To illustrate this idea, suppose that a study evaluates m tests of association. Each of these associations can be classified jointly as real or spurious, and as "significant" or "nonsignificant" according to some rule based on the study data. For example, a common rule classifies as significant those associations having P values less than 0.05. The frequentist false discovery rate associated with a rule is the expected fraction of significant findings that are spurious. Benjamini and Hochberg (3) proposed a simple rule (the BH rule) for declaring a subset of the m associations significant, based on a prespecified value r between 0 and 1. Remarkably, the r-based BH rule is guaranteed to have a false discovery rate of at most 100r%, regardless of how many, and which, of the m associations are spurious. The BH rule can be considerably more powerful than the Bonferroni correction (3).
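The two ingredients described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example numbers are hypothetical, and the BH step follows the standard step-up form of the procedure (find the largest rank i whose ordered P value is at most i·r/m, then declare the i smallest P values significant).

```python
def prior_odds(k):
    """Prior odds k / (1 - k) for a prior probability k that the association is true."""
    return k / (1.0 - k)


def posterior_odds(k, q, p):
    """Posterior odds: prior odds multiplied by (power q / P value p)."""
    return prior_odds(k) * (q / p)


def benjamini_hochberg(p_values, r):
    """Return the indices declared significant by the step-up BH rule at level r."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Remember the largest rank whose P value falls under its threshold.
        if p_values[i] <= rank * r / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical example: prior probability 0.01, power 0.8, P value 0.001.
odds = posterior_odds(0.01, 0.8, 0.001)

# Hypothetical P values for m = 8 tests, evaluated at r = 0.05.
significant = benjamini_hochberg(
    [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205], 0.05
)
```

With these illustrative inputs, the posterior odds are about 8 to 1 in favor of the association even though the prior probability was only 1%, and the BH rule retains the two smallest P values.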
The BH rule is based only on P values, and thus considers neither the biological plausibility, nor the strength of prior evidence, nor the statistical power of the various associations tested. In research currently under way, I am investigating an r-based Bayesian rule that ranks the m associations with respect to the posterior odds that they are true. Like the frequentist BH rule, this Bayesian rule ensures that, on average, no more than 100r% of the significant associations are spurious. As a by-product, it also produces a meaningful ranking of the associations with respect to the posterior odds in their favor. Its implementation involves (a) enumerating the associations tested; (b) combining prior odds, power, and P value to calculate the posterior odds in favor of each association; (c) ranking the associations with respect to their posterior odds; and (d) using an r-based Bayesian false discovery rate rule to select the noteworthy ones. Is it time to implement the goals outlined by Rebbeck et al. (1) in such a quantitative way? Should we rank reported associations according to some formal measure of credibility, such as their posterior odds? Such a measure would help readers to evaluate the importance of observed associations, or absences thereof, systematically and consistently: not only by their P values but also by their plausibility, the prior evidence supporting them, and the study power. No quantitative measure (including the one proposed here) can escape some limitations [see Thomas and Clayton (4) for a thoughtful discussion of these], nor can it eliminate the need for subjective judgment calls. Nevertheless, a formal ranking would make these calls and their rationale more explicit.
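Steps (a) through (d) can be sketched as follows. The letter does not specify the Bayesian selection rule, so step (d) here is an assumption: it uses one common Bayesian false discovery rate device, namely selecting the largest top-ranked set whose average posterior probability of being spurious is at most r. The function name, the tuple layout, and the example values are hypothetical.

```python
def rank_and_select(associations, r):
    """Rank associations by posterior odds and select a noteworthy subset.

    associations: list of (name, k, q, p) tuples, where k is the prior
    probability, q the study power, and p the observed P value.
    Returns (ranking, selected): the ranking is a list of
    (name, posterior_odds, prob_spurious) sorted by decreasing odds.
    """
    scored = []
    for name, k, q, p in associations:
        # Step (b): posterior odds = prior odds * (power / P value).
        odds = (k / (1.0 - k)) * (q / p)
        # Posterior probability that the association is spurious.
        prob_spurious = 1.0 / (1.0 + odds)
        scored.append((name, odds, prob_spurious))

    # Step (c): rank by decreasing posterior odds.
    scored.sort(key=lambda item: item[1], reverse=True)

    # Step (d), ASSUMED rule: keep the largest top-ranked set whose mean
    # posterior probability of being spurious does not exceed r.
    selected = []
    running_total = 0.0
    for n, (name, odds, prob_spurious) in enumerate(scored, start=1):
        running_total += prob_spurious
        if running_total / n > r:
            break  # the running mean only grows further down the ranking
        selected = [item[0] for item in scored[:n]]
    return scored, selected


# Step (a): enumerate the associations tested (hypothetical values).
tested = [
    ("variant A", 0.1, 0.8, 0.001),
    ("variant B", 0.01, 0.8, 0.01),
    ("variant C", 0.001, 0.5, 0.04),
]
ranking, noteworthy = rank_and_select(tested, r=0.2)
```

In this illustrative run only "variant A" is selected: variants B and C both have nominally significant P values, but their low prior probabilities leave them with posterior odds below 1.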
Journal title:
- Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology
Volume 14, Issue 6
Pages -
Publication date 2005